How GPUs and High-Performance Computing Can Augment Big Data
Recently, the WWT Big Data Practice has been researching what is next for the big data space. We are all well informed about the pros and cons of Hadoop, Spark, NoSQL, etc., but what will be the next technology that makes these tools even faster, better, cheaper? Can other areas of computer science be leveraged with the current tools to accelerate functionality? Will there be a technology that completely disrupts this space?
Bringing elements of High Performance Computing (HPC) to the big data field has huge potential for disruption. A crucial component that differentiates an HPC system from a big data system is its many-core processors, such as the NVIDIA GPU or the Intel Xeon Phi. Understanding how to leverage these processors in the field of big data could provide the horsepower needed for the complex machine learning algorithms being introduced into the enterprise. This post will focus on the particulars of the NVIDIA GPU.
Overview of GPUs
Several years ago, NVIDIA realized that their GPUs, which were typically used to render images and video, could be repurposed to solve computationally intensive mathematical problems, and to do so more cost effectively than traditional CPU-based supercomputing. Because of this, they developed the general-purpose GPU along with CUDA, a programming platform that lets scientists write code that runs directly on the GPU. Now, some of the most powerful supercomputers in the world, such as Titan at the Oak Ridge National Laboratory and Tianhe-1A at the National Supercomputing Center in Tianjin, China, are built using GPUs.
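To give a flavor of what CUDA code looks like, below is a minimal sketch of the canonical vector-addition example (the kernel and variable names are ours, purely for illustration). Each of the roughly one million GPU threads launched computes a single element of the output, which is exactly the data-parallel style the GPU was built for.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel: runs on the GPU, one thread per output element.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                          // one million elements
    size_t bytes = n * sizeof(float);

    // Host (CPU) buffers.
    float *hA = (float *)malloc(bytes);
    float *hB = (float *)malloc(bytes);
    float *hC = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    // Device (GPU) buffers, plus copies of the inputs onto the card.
    float *dA, *dB, *dC;
    cudaMalloc((void **)&dA, bytes);
    cudaMalloc((void **)&dB, bytes);
    cudaMalloc((void **)&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256, blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(dA, dB, dC, n);

    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hC[0]);                   // expect 3.0

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}
```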
More recently, the machine learning community has started embracing GPU-based systems for some of its most challenging problems. In particular, the training of deep learning algorithms has produced some fantastic success stories. Training these algorithms has traditionally taken weeks or even months on CPU-based systems; on GPU-based systems, they can now be trained in days or even hours. Large research groups at Google, Baidu, Yahoo, Microsoft, and Facebook are doing amazing work in the fields of computer vision and speech recognition by leveraging GPUs and deep learning (see the keynotes from Andrew Ng at Baidu and Jeff Dean at Google for more information).
How does a GPU differ physically from a CPU? If you take a look at Figure 1, you can get an idea of how the different elements of a processor are distributed on a CPU vs. a GPU. The green Arithmetic Logic Units (ALUs) are where the computations occur. On a CPU there are typically only a handful of ALUs, and a majority of the transistors are used for data caching and flow control. On a GPU, however, a large portion of the transistors are ALUs devoted to data processing. In fact, a single NVIDIA Tesla K80 has 4,992 CUDA cores for processing! These ALUs are simple, data-parallel, multi-threaded cores that offer high computing power and large memory bandwidth, all with relatively low power consumption. (The sketch after Figure 1 shows how to query these characteristics on your own hardware.)
Figure 1: The GPU devotes more transistors to data processing (Source: AllegroViva)
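The CUDA runtime can report these hardware characteristics directly. The short sketch below simply queries every installed GPU for its streaming multiprocessor (SM) count and memory configuration; the total CUDA core count is the SM count times the cores per SM, which varies by architecture.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        // Query the static hardware properties of device d.
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, d);
        printf("Device %d: %s\n", d, p.name);
        printf("  Streaming multiprocessors: %d\n", p.multiProcessorCount);
        printf("  Max threads per block:     %d\n", p.maxThreadsPerBlock);
        printf("  Memory bus width:          %d bits\n", p.memoryBusWidth);
        printf("  Memory clock rate:         %d kHz\n", p.memoryClockRate);
    }
    return 0;
}
```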
By design, a GPU can process data several times faster than any CPU, as shown in Figure 2; however, there are some limitations. Because of this design, sequential serial processing is less effective on a GPU than on a CPU. In addition, developing algorithms for GPUs is complicated and requires sophisticated low-level programming. And some algorithms simply cannot be parallelized because of their inherent data interdependencies. Therefore, a heterogeneous system with both CPUs and GPUs can provide the best of both worlds: sequential serial processing and highly parallel processing (see the sketch after Figure 2).
Figure 2: Processing power and memory bandwidth comparison between GPUs and CPUs (Source: presentation by Steve Oberlin at the GPU Technology Conference 2015)
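As a toy sketch of that division of labor (our own illustrative code, assuming only the standard CUDA runtime), the data-parallel step below runs across thousands of GPU threads, while a loop-carried dependency, where each iteration needs the previous result, stays on the CPU.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Data-parallel step: no dependencies between elements, ideal for the GPU.
__global__ void scale(float *x, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}

// Sequential step: each iteration depends nonlinearly on the previous one,
// so it cannot simply be split across thousands of threads.
void serialStep(float *x, int n) {
    for (int i = 1; i < n; ++i) x[i] = x[i] / (1.0f + x[i - 1]);
}

int main() {
    const int n = 1 << 16;
    size_t bytes = n * sizeof(float);
    float *h = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    // The GPU handles the highly parallel portion of the workload.
    float *d;
    cudaMalloc((void **)&d, bytes);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);

    // The CPU handles the serial portion with data interdependencies.
    serialStep(h, n);
    printf("last element = %f\n", h[n - 1]);
    free(h);
    return 0;
}
```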
How HPC and Big Data have been kept separate in the past
Until recently, the fields of Big Data and HPC have been kept somewhat separate. The use of HPC technology has typically been an expensive endeavor pursued only by select high-end scientific research groups and financial institutions. Unlike Big Data, where systems are measured in terabytes and petabytes of storage, the HPC community is primarily concerned with teraFLOPS and petaFLOPS of computational performance.
HPC systems are purpose-built to parallelize and process complex computational algorithms. These algorithms are used in fields such as nuclear physics, molecular modeling, and weather forecasting, where mathematical functions can be broken up into many small tasks and distributed across the massive number of cores available in an HPC system.
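A classic instance of this decomposition, sketched below with our own illustrative code, is numerical integration: the midpoint rule applied to 4 / (1 + x^2) over [0, 1], which evaluates to pi, is split into a million slices, one per GPU thread, with partial results combined by a tree reduction inside each block before a final small sum on the CPU.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread integrates one slice of [0,1]; each block then reduces its
// threads' contributions to a single partial sum in shared memory.
__global__ void integrate(double *blockSums, int n) {
    extern __shared__ double cache[];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    double h = 1.0 / n, v = 0.0;
    if (i < n) {
        double x = (i + 0.5) * h;           // midpoint of slice i
        v = h * 4.0 / (1.0 + x * x);
    }
    cache[threadIdx.x] = v;
    __syncthreads();
    // Tree reduction within the block (blockDim.x must be a power of two).
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) cache[threadIdx.x] += cache[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) blockSums[blockIdx.x] = cache[0];
}

int main() {
    const int n = 1 << 20, threads = 256;
    int blocks = (n + threads - 1) / threads;
    double *dSums;
    cudaMalloc((void **)&dSums, blocks * sizeof(double));
    integrate<<<blocks, threads, threads * sizeof(double)>>>(dSums, n);

    // Final (small) reduction of the per-block partial sums on the CPU.
    double *hSums = (double *)malloc(blocks * sizeof(double));
    cudaMemcpy(hSums, dSums, blocks * sizeof(double), cudaMemcpyDeviceToHost);
    double pi = 0.0;
    for (int b = 0; b < blocks; ++b) pi += hSums[b];
    printf("pi ~= %.9f\n", pi);

    cudaFree(dSums);
    free(hSums);
    return 0;
}
```

Real HPC codes use far more sophisticated schemes, but the pattern of many tiny independent tasks feeding a reduction is the same one used at petaFLOP scale.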
Big Data systems, on the other hand, are purpose-built to handle data-intensive applications. They were born out of the early big Internet companies, such as Google and Yahoo, which needed to store and trawl every piece of information on the Internet and have that data available at all times for customers to use. Overall, the major issues these Internet companies faced were scalability, reliability, and availability, not so much computational power.
How HPC and Big Data can complement each other now
Now, as large-scale machine learning and streaming start to play a larger role in the enterprise, Big Data systems are in need of more computational capability. Tools like Spark are making streaming and machine learning algorithms more powerful in Hadoop, but they will always be limited by the relatively low core count available in the CPUs on each node. This is where many-core devices, such as NVIDIA's general-purpose GPUs, are able to help.
Imagine a heterogeneous system of GPUs and CPUs that streams data in near real time through a GPU-based tier that performs on-the-fly processing and calculations for quick decision making. The data can be kept in memory for a short period so analysts can interact with it extremely fast, performing calculations and transformations in a snap. As the data grows and ages, it can be persisted into a Hadoop cluster, where Spark can be used for larger batch processing on years of historical data. If a model needs to be trained on this historical set, the GPUs can once again be leveraged for fast training of many models, allowing data scientists to be more productive in their machine learning efforts.
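No off-the-shelf system does all of this today, but the streaming half can at least be sketched with standard CUDA streams. In the toy below (the `score` kernel is a hypothetical stand-in for a real model), batches round-robin across two streams so that while one batch is being scored on the GPU, the next can already be transferring.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Toy "scoring" kernel standing in for on-the-fly analytics on a batch.
__global__ void score(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 0.5f + 1.0f;       // placeholder model
}

int main() {
    const int batch = 1 << 18, nBatches = 8, nStreams = 2;
    size_t bytes = batch * sizeof(float);

    // Pinned host buffers so copies can overlap with kernel execution.
    float *hIn, *hOut;
    cudaMallocHost((void **)&hIn, bytes * nStreams);
    cudaMallocHost((void **)&hOut, bytes * nStreams);

    float *dIn[nStreams], *dOut[nStreams];
    cudaStream_t s[nStreams];
    for (int j = 0; j < nStreams; ++j) {
        cudaMalloc((void **)&dIn[j], bytes);
        cudaMalloc((void **)&dOut[j], bytes);
        cudaStreamCreate(&s[j]);
    }

    for (int b = 0; b < nBatches; ++b) {
        int j = b % nStreams;                      // round-robin over streams
        cudaStreamSynchronize(s[j]);               // wait until this slot is free
        for (int i = 0; i < batch; ++i) hIn[j * batch + i] = (float)b;  // ingest

        // Copy in, score, and copy out, all asynchronously on this stream.
        cudaMemcpyAsync(dIn[j], hIn + j * batch, bytes, cudaMemcpyHostToDevice, s[j]);
        score<<<(batch + 255) / 256, 256, 0, s[j]>>>(dIn[j], dOut[j], batch);
        cudaMemcpyAsync(hOut + j * batch, dOut[j], bytes, cudaMemcpyDeviceToHost, s[j]);
    }

    cudaDeviceSynchronize();
    printf("last batch, first score = %f\n", hOut[((nBatches - 1) % nStreams) * batch]);

    for (int j = 0; j < nStreams; ++j) {
        cudaFree(dIn[j]);
        cudaFree(dOut[j]);
        cudaStreamDestroy(s[j]);
    }
    cudaFreeHost(hIn);
    cudaFreeHost(hOut);
    return 0;
}
```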
While this is still in its beginning stages, we at WWT believe a system of this kind is the next phase of Big Data analytics. We are excited to be working on such cutting-edge technologies in order to stay ahead of the curve, and we believe this will allow us to best serve our customers when they come to us for guidance on their mission-critical big data initiatives.